
Wide Crawls

Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.

285,524 results


collections 18
web 285,503
data 3

PART OF
Internet Archive Web Crawls
Web Crawls

TOPIC
crawldata 285,503
amazonbooks 249
Wide Crawl started April 2013
24,879 items, 185.8M views
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wide Crawl started January 2012
28,371 items, 118M views
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
38,814 items, 112.9M views
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Wide Crawl started August 2013
21,700 items, 109.7M views
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Wide Crawl started October 2010
15,222 items, 80.5M views
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Wide Crawl Started January 2013
14,975 items, 79.7M views
79M
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide Crawl started October 2011
11,867 items, 68.2M views
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Wide Crawl started June 2014
45,309 items, 62.6M views
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wide Crawl started March 2011
8,178 items, 61.5M views
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix...
Wide Crawl started February 2014
9,611 items, 30.9M views
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl started January 2015
35,084 items, 5.3M views
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
2.1M
Web wide crawl with initial seedlist and crawler configuration from September 2010.
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 12:25:51 PDT 2013 to Wed May 15 06:56:55 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
904,951 views
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Wed Jul 24 23:32:46 PDT 2013 to Wed Jul 24 18:16:50 PDT 2013.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Fri Sep 24 20:27:19 UTC 2010 to Sat Sep 25 04:26:09 UTC 2010.
Topic: crawldata
Wide Crawl started April 2013
577,632 views
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Wed May 15 12:43:53 PDT 2013 to Wed May 15 07:09:54 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
572,719 views
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 13:19:48 PDT 2013 to Wed May 15 07:48:12 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
388,088 views
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 12:34:32 PDT 2013 to Wed May 15 07:17:30 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
381,411 views
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 13:33:28 PDT 2013 to Wed May 15 07:56:18 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 20:13:06 UTC 2010 to Sun Sep 26 06:01:07 UTC 2010.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Wed Jan 30 00:37:31 PST 2013 to Tue Jan 29 21:10:11 PST 2013.
Topic: crawldata
Host Screen Captures
8,938 items, 315,613 views
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Fri Sep 24 19:56:41 UTC 2010 to Sat Sep 25 03:53:54 UTC 2010.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360907.us.archive.org:wide from Thu Nov 25 14:37:09 UTC 2010 to Thu Nov 25 18:13:57 UTC 2010.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360907.us.archive.org:wide from Thu Nov 25 14:05:25 UTC 2010 to Thu Nov 25 17:38:12 UTC 2010.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 04:26:09 UTC 2010 to Sat Sep 25 13:42:28 UTC 2010.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 03:53:54 UTC 2010 to Sat Sep 25 12:53:45 UTC 2010.
Topic: crawldata
Wide Crawl started March 2011
215,792 views
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Thu Jun 23 21:11:43 PDT 2011 to Thu Jun 23 14:53:15 PDT 2011.
Topic: crawldata
Wide Crawl started March 2011
212,857 views
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Sun Aug 14 03:13:23 PDT 2011 to Sat Aug 13 22:20:18 PDT 2011.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Tue May 7 13:13:34 PDT 2013 to Tue May 7 07:18:32 PDT 2013.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 13:42:28 UTC 2010 to Sat Sep 25 21:24:31 UTC 2010.
Topic: crawldata
Wide Crawl started April 2012
207,049 views
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Fri May 25 03:27:07 PDT 2012 to Thu May 24 22:54:59 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
197,412 views
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Wed May 15 13:36:11 PDT 2013 to Wed May 15 07:55:53 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Thu Jun 7 23:15:28 PDT 2012 to Thu Jun 7 19:17:03 PDT 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 21:24:33 UTC 2010 to Sun Sep 26 07:48:58 UTC 2010.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jan 11 14:29:43 PST 2012 to Wed Jan 11 07:51:53 PST 2012.
Topic: crawldata
Wide Crawl started April 2012
175,111 views
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Fri May 25 11:48:37 PDT 2012 to Fri May 25 07:42:57 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Wed Oct 24 12:05:49 PDT 2012 to Wed Oct 24 07:00:10 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 14:54:11 PST 2012 to Thu Jan 5 07:47:32 PST 2012.
Topic: crawldata
Wide Crawl started April 2013
163,878 views
Internet Archive crawldata from Webwide Crawl, captured by crawl450.us.archive.org:wideaux from Sat Apr 20 23:32:13 PDT 2013 to Sat Apr 20 21:46:22 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Wed Nov 14 17:39:11 PST 2012 to Wed Nov 14 10:55:11 PST 2012.
Topic: crawldata
Wide Crawl started April 2013
151,062 views
Internet Archive crawldata from Webwide Crawl, captured by crawl450.us.archive.org:wideaux from Mon May 13 19:16:42 PDT 2013 to Mon May 13 14:17:35 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Wed May 2 08:00:09 PDT 2012 to Wed May 2 01:59:58 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Mon Oct 15 01:44:34 PDT 2012 to Sun Oct 14 20:06:53 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Tue Oct 2 20:12:23 PDT 2012 to Tue Oct 2 17:20:31 PDT 2012.
Topic: crawldata
Internet Archive crawldata from all sites, captured by ia360924.us.archive.org:wide from Thu Oct 28 21:17:04 UTC 2010 to Fri Oct 29 05:08:02 UTC 2010.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Tue Jan 29 22:00:06 PST 2013 to Tue Jan 29 18:12:37 PST 2013.
Topic: crawldata
Wide Crawl started April 2013
135,312 views
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed Jun 12 05:21:53 PDT 2013 to Wed Jun 12 00:00:20 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl338.us.archive.org:wide from Sat Feb 22 06:42:19 PST 2014 to Sat Feb 22 01:03:35 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Tue Apr 2 19:47:01 PDT 2013 to Tue Apr 2 15:16:59 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 21:32:46 PST 2012 to Thu Jan 5 14:25:07 PST 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Mon Oct 15 22:52:58 PDT 2012 to Mon Oct 15 18:36:41 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
128,521 views
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 09:24:17 PDT 2013 to Wed May 15 04:10:53 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Thu May 3 12:44:34 PDT 2012 to Thu May 3 07:11:32 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
127,581 views
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Wed May 15 09:49:58 PDT 2013 to Wed May 15 04:09:39 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Oct 16 07:37:59 PDT 2012 to Tue Oct 16 03:14:49 PDT 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl448.us.archive.org:argov from Wed Aug 24 01:32:00 PDT 2011 to Thu Sep 1 12:02:59 PDT 2011.
Topic: crawldata
Wide Crawl started April 2013
127,196 views
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Tue Apr 23 12:43:43 PDT 2013 to Tue Apr 23 07:24:39 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Mon Jun 3 09:45:27 PDT 2013 to Mon Jun 3 05:04:44 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sat Aug 10 08:17:48 PDT 2013 to Sat Aug 10 03:21:13 PDT 2013.
Topic: crawldata
Wide Crawl started April 2012
125,422 views
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun May 13 07:09:39 PDT 2012 to Sun May 13 01:49:20 PDT 2012.
Topic: crawldata